@thinknoack thinknoack commented Jun 27, 2025

Refactors how contributor (author) information is sourced, processed, and displayed across the application, prioritizing datacite.yml.

Notes

  • Contributor resolver
  • Two contributor display components that link to ORCID and to the user's OpenNeuro (ON) datasets where possible
  • New specs for the contributor component in app/users and server/datalad

@thinknoack thinknoack marked this pull request as draft June 27, 2025 00:42
codecov bot commented Jun 27, 2025

Codecov Report

❌ Patch coverage is 82.11765% with 76 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.86%. Comparing base (89c0241) to head (512d415).
⚠️ Report is 50 commits behind head on master.

Files with missing lines                                   Patch %   Lines
packages/openneuro-server/src/datalad/creators.ts          88.26%    27 Missing ⚠️
packages/openneuro-server/src/utils/orcid-utils.ts         4.76%     20 Missing ⚠️
packages/openneuro-server/src/graphql/schema.ts            0.00%     14 Missing ⚠️
...ckages/openneuro-app/src/scripts/users/creator.tsx      89.23%    7 Missing ⚠️
.../scripts/search/components/SearchResultDetails.tsx      20.00%    4 Missing ⚠️
...euro-app/src/scripts/search/use-search-results.tsx      60.00%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3510      +/-   ##
==========================================
+ Coverage   48.41%   48.86%   +0.44%     
==========================================
  Files         593      605      +12     
  Lines       42150    43081     +931     
  Branches     1388     1444      +56     
==========================================
+ Hits        20409    21052     +643     
- Misses      21572    21858     +286     
- Partials      169      171       +2     

@thinknoack thinknoack requested a review from nellh July 1, 2025 00:21
@thinknoack thinknoack changed the title WIP adding new resolver contributors/datacite.yml Adding new resolver contributors/datacite.yml Jul 1, 2025
@thinknoack thinknoack self-assigned this Jul 2, 2025
@nellh nellh left a comment

Taking a deeper look at the specification, I found a few issues with how we've approached this. The implementation here looks good, but I think a few changes are required to align with the current DataCite metadata schema.

https://datacite-metadata-schema.readthedocs.io/_/downloads/en/4.6/pdf/

)
}
if (authors.length) {
if (authors.length) { // TODO - NELL - this was switched to contributors - is that correct?

Yeah, we want to match contributors once that's available.

const dataciteData = await dataciteCache.get(() =>
  getDataciteYml(datasetId, revision)
)
if (dataciteData && Array.isArray(dataciteData.authors)) {

Looks like our example here was actually not very representative of a correct implementation.

Digging deeper into 4.6, the authors data is actually part of creators:

https://schema.datacite.org/meta/kernel-4.6/

Here's a better example of structured author metadata.

creators:
  - creatorName:
      nameType: Personal
      value: Smith, John
    givenName: John
    familyName: Smith
    nameIdentifiers:
      - nameIdentifierScheme: ORCID
        schemeUri: https://orcid.org
        nameIdentifier: https://orcid.org/0000-0002-1825-0097

There is also a contributor field that is meant for non-author roles.
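
To make the shape concrete, here is a rough sketch of how an already-parsed `creators` array like the one above could be flattened into simple records with an optional ORCID. The `RawCreator`, `Creator`, `extractOrcid`, and `normalizeCreators` names are hypothetical and not taken from this PR; YAML parsing is assumed to have happened upstream.

```typescript
// Hypothetical shapes mirroring the YAML example above; not the PR's actual types.
interface RawNameIdentifier {
  nameIdentifierScheme?: string
  schemeUri?: string
  nameIdentifier?: string
}

interface RawCreator {
  creatorName?: { nameType?: string; value?: string }
  givenName?: string
  familyName?: string
  nameIdentifiers?: unknown
}

interface Creator {
  name: string
  givenName?: string
  familyName?: string
  orcid?: string
}

// Matches a trailing ORCID iD (four hyphen-separated groups of four characters),
// with or without the https://orcid.org/ prefix.
const ORCID_PATTERN = /(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$/

// Returns the bare ORCID iD from the first identifier whose scheme is ORCID.
function extractOrcid(ids: unknown): string | undefined {
  if (!Array.isArray(ids)) return undefined
  for (const id of ids as RawNameIdentifier[]) {
    if (!id || typeof id !== "object") continue
    if (id.nameIdentifierScheme?.toUpperCase() !== "ORCID") continue
    const match = id.nameIdentifier?.trim().match(ORCID_PATTERN)
    if (match) return match[1]
  }
  return undefined
}

// Flattens datacite-style creators; entries without a creatorName value are skipped.
function normalizeCreators(raw: unknown): Creator[] {
  if (!Array.isArray(raw)) return []
  const creators: Creator[] = []
  for (const entry of raw as RawCreator[]) {
    if (!entry || typeof entry !== "object") continue
    const name = entry.creatorName?.value?.trim()
    if (!name) continue
    creators.push({
      name,
      givenName: entry.givenName,
      familyName: entry.familyName,
      orcid: extractOrcid(entry.nameIdentifiers),
    })
  }
  return creators
}
```

Storing only the bare iD (rather than the full URL) is a design choice made for this sketch; the display components could rebuild the orcid.org link from it.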

Comment on lines 167 to 195
if (!parsedContributors || parsedContributors.length === 0) {
  const descriptionJsonCache = new CacheItem(
    redis,
    CacheType.datasetDescription,
    [datasetId, revisionShort],
  )
  try {
    const datasetDescriptionJson = await descriptionJsonCache.get(() =>
      getDescriptionObject(datasetId, revision).then(
        (uncachedDescription) => ({
          id: revision,
          ...uncachedDescription,
        }),
      )
    )
    if (
      datasetDescriptionJson &&
      Array.isArray(datasetDescriptionJson.Authors)
    ) {
      parsedContributors = normalizeBidsAuthors(
        datasetDescriptionJson.Authors,
      )
      Sentry.captureMessage(
        `Loaded contributors from dataset_description.json for ${datasetId}:${revision}`,
      )
    }
  } catch (error) {
    Sentry.captureException(error)
  }

It would be better to call the existing description resolver here, which avoids reimplementing parts of it.
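
Roughly what I have in mind, as a sketch only: the fallback receives the existing description resolver as a function instead of rebuilding its caching and fetching. All names here (`resolveCreators`, `authorsToCreators`, and the two loader types) are hypothetical, and the real resolver signatures in the codebase will differ.

```typescript
// Hypothetical minimal types; the real resolver signatures may differ.
interface Creator {
  name: string
  orcid?: string
}

interface DatasetDescription {
  Authors?: unknown
}

type DataciteCreatorsLoader = () => Promise<Creator[] | null>
type DescriptionResolver = () => Promise<DatasetDescription | null>

// Turns BIDS dataset_description.json Authors (plain strings) into Creator records,
// dropping non-string and blank entries.
function authorsToCreators(authors: unknown): Creator[] {
  if (!Array.isArray(authors)) return []
  return authors
    .filter((a): a is string => typeof a === "string" && a.trim() !== "")
    .map((a) => ({ name: a.trim() }))
}

// Prefers datacite.yml creators; otherwise delegates to the injected
// description resolver rather than re-fetching dataset_description.json.
async function resolveCreators(
  loadDataciteCreators: DataciteCreatorsLoader,
  resolveDescription: DescriptionResolver,
): Promise<Creator[]> {
  let creators: Creator[] | null = null
  try {
    creators = await loadDataciteCreators()
  } catch {
    // A missing or unparsable datacite.yml is treated as "no creators found".
    creators = null
  }
  if (creators && creators.length > 0) return creators
  const description = await resolveDescription()
  return authorsToCreators(description?.Authors)
}
```

Passing the resolver in as an argument also keeps the fallback easy to exercise in specs without Redis or git access.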

# Single list of files to download this snapshot (only available on snapshots)
downloadFiles: [DatasetFile]
# Authors list from datacite.yml || dataset_description.json
contributors: [Contributor]

Thinking about this: since datacite.yml distinguishes between creators and contributors, we might want to call this creator instead of exposing the creators field as contributors in our API schema. That leaves contributors free for the datacite.yml contributor fields later, and reusing the name is likely to be confusing once we want both fields. It also looks like the correct way to add things like curation contributors is the optional contributors field, not listing them as authors (creators in datacite.yml).
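
A small sketch of the separation, with hypothetical type and field names (none of these are the PR's actual types): creators carry authorship, contributors carry a DataCite contributorType such as DataCurator, and anything that renders an author list reads only from creators.

```typescript
// Hypothetical record shapes for the two DataCite roles; field names are illustrative.
interface Creator {
  name: string
  orcid?: string
}

interface Contributor {
  name: string
  // DataCite contributorType, e.g. "DataCurator"
  contributorType: string
}

interface SnapshotPeople {
  creators: Creator[] // authors (DataCite creators)
  contributors: Contributor[] // non-author roles (DataCite contributors)
}

// Builds an author line from creators only, so curators and other
// contributors are never presented as authors.
function authorLine(people: SnapshotPeople): string {
  return people.creators.map((c) => c.name).join("; ")
}
```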

/**
* Attempts to read and parse datacite.yml.
*/
const getDataciteYml = async (

One more error case to consider is this:

resourceType:
  resourceTypeGeneral: JournalArticle

Anything other than Dataset in this field means that it is probably an unrelated datacite.yml. It's not unlikely for someone to annotate the software or an article like this, so we should check that resourceTypeGeneral is Dataset.
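
Something along these lines would do as a guard before parsing creators. This is a sketch: `isDatasetDatacite` is a hypothetical helper, it assumes the file has already been parsed into a plain object with `resourceType` at the top level as in the snippet above, and the case-insensitive comparison is a leniency chosen here, not something the schema requires.

```typescript
// Hypothetical guard: true only when the parsed datacite.yml describes a Dataset.
function isDatasetDatacite(data: unknown): boolean {
  if (typeof data !== "object" || data === null) return false
  const resourceType = (data as Record<string, unknown>).resourceType
  if (typeof resourceType !== "object" || resourceType === null) return false
  const general = (resourceType as Record<string, unknown>).resourceTypeGeneral
  // Missing or non-string values are treated as "not a dataset".
  return typeof general === "string" &&
    general.trim().toLowerCase() === "dataset"
}
```

When the guard returns false, the file would be ignored and the resolver would fall back to dataset_description.json.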

…o use creators, uses new datacite schema (4.6) for parsing and also checks that resourceTypeGeneral is Dataset before parsing, otherwise fallback uses description resolver.
@thinknoack thinknoack changed the title Adding new resolver contributors/datacite.yml WIP Adding new resolver contributors/datacite.yml Jul 8, 2025
@thinknoack thinknoack requested a review from nellh July 8, 2025 23:24
thinknoack added a commit that referenced this pull request Aug 11, 2025
… - needs hook up with #3510(or future update)  and datacite.yml
@thinknoack thinknoack changed the title WIP Adding new resolver contributors/datacite.yml WIP Adding new resolver creators/datacite.yml Aug 13, 2025
@thinknoack thinknoack closed this Aug 14, 2025
@thinknoack thinknoack deleted the feature/dataset-contributors-resolver branch August 14, 2025 19:48
@thinknoack thinknoack restored the feature/dataset-contributors-resolver branch August 14, 2025 19:50